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Abstract 



when selection is acting on a large genetically diverse population, beneficial alleles increase in frequency. This fact can be 
used to map quantitative trait loci by sequencing the pooled DNA from the population at consecutive time points and 
observing allele frequency changes. Here, we present a population genetic method to analyze time series data of allele fre- 
quencies from such an experiment. Beginning with a range of proposed evolutionary scenarios, the method measures the 
consistency of each with the observed frequency changes. Evolutionary theory is utilized to formulate equations of motion 
for the allele frequencies, following which likelihoods for having observed the sequencing data under each scenario are de- 
rived. Comparison of these likelihoods gives an insight into the prevailing dynamics of the system under study. We illus- 
trate the method by quantifying selective effects from an experiment, in which two phenotypically different yeast strains 
were first crossed and then propagated under heat stress (Parts L, Cubillos FA, Warringer J, et al. [14 co-authors]. 2011. Re- 
vealing the genetic structure of a trait by sequencing a population under selection. Genome Res). From these data, we dis- 
cover that about 6% of polymorphic sites evolve nonneutrally under heat stress conditions, either because of their link- 
age to beneficial (driver) alleles or because they are drivers themselves. We further identify 44 genomic regions contain- 
ing one or more candidate driver alleles, quantify their apparent selective advantage, obtain estimates of recombination 
rates within the regions, and show that the dynamics of the drivers display a strong signature of selection going be- 
yond additive models. Our approach is applicable to study adaptation in a range of systems under different evolutionary 
pressures. 
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Introduction 

Fitness differences between individuals enable natural selec- 
tion to increase the frequency of beneficial variants within a 
population over time. The specifics of this process, however, 
are often complex, with the fitness difference conferred by 
a variant potentially depending on time, space, the genetic 
background of the individual, the genotypic composition of 
the population, and other species in the vicinity (Hartl and 
Clark 2007). Furthermore, other evolutionary forces such as 
mutation and genetic drift contribute to allele frequency 
changes, and their effects can mask those arising from dif- 
ferences in fitness. 

For these reasons, the importance of studying the fitness 
effects of mutations and evolution in controlled laboratory 
settings is well known. One of the most celebrated of these 
experiments is the "£. coli long-term evolution experiment," 
which so far covers over 50,000 generations (Woods et al. 
2011). Deep sequencing data from this and other experi- 
ments have provided an unprecedented level of insight into 
the processes of molecular evolution (Barrick et al. 2009; 
Burke et al. 2010; Hietpas et al. 2011). However, despite this 
progress in the quantitative study of evolution, fundamen- 
tal challenges still remain. 

A first challenge in studying fitness effects isthetimescale 
it can take for evolution to increase the frequency of a 



mutation. Consider for instance, a hypothetical variant in 
Escherichia coli with a fitness advantage of ctq = /mutant — 
/wiidtype = 10^^. Supposing the variant to have survived ge- 
netic drift, and neglecting the effects of other mutations, 
in a population of size N = 10^, the variant would take 
~10^ generations to fix, representing hundreds of years 
of evolution. While in population genetic terms, such a 
variant would be considered strongly beneficial (with the 
ratio of timescales associated with genetic drift and se- 
lection, (jflN ^ 1); in a laboratory setting, it would 
be barely detectable within the lifetime of the researcher. 
For this reason, knowledge of fitness effects of that size 
is derived via analysis of intra- and interspecies variation 
(Sawyer et al. 2003; Eyre-Walker 2006; Eyre-Walker and 
Keightley 2007; Mustonen and Lassig 2007; Sella et al. 2009). 

A second challenge is posed by the mutation rate, which is 
often small, such that variance within a population is created 
slowly. The low initial frequency of a new mutant renders 
even strongly beneficial mutations susceptible to elimina- 
tion through genetic drift, while, as noted above, mutants 
escaping drift take time to grow to detectable frequencies. 
Waiting for a well-adapted initial population to "find" 
beneficial mutations, and for these to become visible, may 
require long-term experiments. Use of mutagens (Weigand 
and Sundin 2009) or increasing the overall number of 
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individuals in the population can alter the number of 
mutations entering the population (Perfeito et al. 2007). 

One means of increasing the rate of adaptation is to ap- 
ply artificial selection on a system by imposing environ- 
mental stress (Kishimoto et al. 2010; Bell and Gonzalez 
2011) or nutritional restriction (Kao and Sherlock 2008). 
A recently described method, examining heat tolerance 
in yeast, combines this approach with the addition of ar- 
tificial variance generated by genetic crosses (Parts et al. 
201 1), crossing two strains of yeast, and propagating the re- 
sulting population asexually under heat stress conditions. 
Conceptually, this factorizes the evolutionary dynamics: 
Mutations have accumulated over a long time period since 
the last common ancestor of the parental strains; recombi- 
nation during the two-way crossing protocol ensures that 
variants not in close proximity are reasonably unlinked and 
at substantial frequencies, and strong selective pressure can 
be applied. As the population adapts to the selected condi- 
tion, multiple changes in allele frequencies can be observed. 
The process can be traced at a single nucleotide resolution 
by deep sequencing the pooled DNA from the population 
at consecutive time points. 

This concept, of applying artificial selection to an arti- 
ficially mixed population in order to identify quantitative 
trait loci (QTLs), has previously been applied to the malaria 
parasite (Culleton et al. 2005) and to yeast (Segre et al. 
2006; Ehrenreich et al. 2010). As performed in the instance 
analyzed here (Parts et al. 2011), the method offers the 
advantages of a high genomic resolution and, extra to pre- 
vious studies, time-resolved sequencing data, with allele fre- 
quencies measured at four distinct time points. In order to 
fully exploit the time series aspect of these new data, further 
methodological development is required. 

In this paper, we develop an approach, based on 
population genetic theory, to investigate data from such 
an experiment. Examining changes in the frequencies of 
segregating alleles over time, we derive measurements 
of the fitness effects that different alleles confer for 
heat tolerance and identify the presence of nonaddi- 
tive fitness effects. As the data we analyze are from the 
asexual version of the experiment, the qualitative pic- 
ture of the dynamics is simple. During the crossing, a 
large pool of recombinant genotypes are created. Un- 
der selective pressure, the genotypes acquire some un- 
known haplotype fitness distribution, which leads to a 
relative proliferation of the fitter, at the cost of the less fit, 
genotypes. However, to obtain a quantitative picture of the 
dynamics is challenging: Sequencing of the pool gives only 
data of allele frequencies, not of genotypes. As will be shown, 
success depends on several factors, the most important be- 
ing the population size. Another key issue is whether the 
initial pool contained enough variation in the high fitness 
part of the population for no single clone to dominate the 
population by the end of the experiment; if this were the 
case, changes in allele frequency would not allow for dis- 
crimination between selected and nonselected alleles in the 
clone. Here, we perform our analysis from the perspective of 
alleles but go on to use the allele picture to seek evidence of 



more complex selective scenarios such as those acting on 
genotypes. While in this case, the evolution is clearly driven 
by haplotype selection (which we can, however, analyze 
starting from an allelic viewpoint), for a sexually propa- 
gated population, an important consideration is whether an 
"allele selection" or a "genotype selection" mode of evolu- 
tion dominates (Neher and Shraiman 2009). 

In focusing on standing variation generated by the cross- 
ing of strains, important questions concerning de novo 
mutation processes are not addressed. Simultaneously, we 
recognize that it is not clear how important such ex- 
treme selection pressures are for organisms over macroevo- 
lutionary timescales. However, experiments such as this 
undoubtedly provide an exciting opportunity to study 
strong fitness effects, both quantitatively and systematically, 
at the molecular level. Here, we address this opportunity to 
quantify the fitness effects acting on the heat tolerance trait 
in yeast within a set of more than 30,000 segregating sites. 

Materials and Methods 

Trajectory Probabilities 

Given observations of allele counts at a locus /', we can 
write the corresponding trajectory probability under some 
evolution model Ai: 

T 

r{n^\M) = llP{n;{t)\NKt).M), (1) 

t=to 

where P is binomial probability distribution, Nf (t) denotes 
number of draws, that is, sequence read depth at locus / 
at time t, and the true underlying population frequency 
of the allele q"{t) depends on the model Ai, which fixes 
the time evolution of the system. The evolution of q"{t) 
can be influenced by alleles at other loci depending on the 
specifics ofM. We can then write the probability for the to- 
tal observation given Ai, which we, to serve as an example, 
take here to be the driver and passengers model defined in 
equations (4-7): 

logV{n\M;) = logV{n;\Mf'''-) 

+ ^logP(n,|Aip,Mf"), 

Mi 

(2) 

where A^i = {Mf'-.M"'"' }, Mf'- = {a„ p,q^{to)}, 
and AlP^"- = {q^{to)}. Thus, we can form a log-likelihood 
score of the model A^, given the observation: 
£{Mi\n) = logr{Mf'''\ni) 

(3) 

This is the function underlying our maximum likelihood 
inference. The structure of the log-likelihood is such that 
the driver ; (index /' also to be learned) influences every pas- 
senger trajectory ], but given the driver dynamics, we can 
independently maximize likelihoods of the passenger 
trajectories, which only have their initial conditions (and if 
so desired their linkage to the driver, D|j) as free parameters. 
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Overview of the Experiment 

Two diverged strains of Saccharomyces cerevisiae, North 
American (NA) and West African (WA), were crossed 
for 12 generations to create a large pool of segregants (Parts 
et al. 2011). After the crossing, the pool was put under 
heat stress (40 °C) for a period of T = 288 h with re- 
plating (after mixing) of 10% of the pool every 48 h, DNA 
from the remainder of the pool being sequenced at time 
points to = 0 h, ti = 96 h, tj = 192 h, ts = 288 h. 
Estimating the number of generations (gen) that T corre- 
sponds to is difficult however, it should in any case be less 
than 10^ gen. In order to avoid the need to convert the unit 
of time from hours to generations (which would be the 
more natural choice in terms of population genetics), we 
here measure rates in units of 1/ (96h). During the selection 
protocol, the population size N varied from ~10^ after 
each replatingto ~10^ before replating. In the following, we 
denote the NA allele with index 0 and WA allele with index 
1. Substantial allele frequency changes were observed over 
the course of the experiment ( Parts et al. 2011). 

Modeling the Observed Time Evolution 

Timescales of the Processes 

A simple calculation suggests the likely role of genetic drift 
and de novo mutations in the selection process to be neg- 
ligible. We first note that the duration of the experiment, 
T, is between five to six orders of magnitude smaller than 
the population size, N. As such, genetic drift, which changes 
allele frequencies at timescales of '^N generations (Kimura 
1964), most likely has very little effect across the course 
of the selection experiment. Considering next the possi- 
bility of de novo mutation, the mutation rate is known 
to be small (Drake et al. 1998; Lang and Murray 2008; 
Lynch et aL 2008). Estimating a worst-case scenario by mod- 
eling the deterministic growth of a mutant at the out- 
set of the experiment, we note that selection pressures of 
~l/Tlog (N/N^) are required to reach frequency ^/N^ 
within time T (as can be derived from eq. 8), where 
is the mean sequencing read depth. This means that the 
variant would need to have ~0.03 growth rate advantage 
per hour to reach detection threshold during the experi- 
ment (for this data, ~ 10^). We contend that muta- 
tions with so large an effect are likely to be extremely rare 
and assume that changes in allele frequencies are driven by 
population variation existing in the initial population (we 
demonstrate later that de novo mutations do not play a 
substantial role directly from the data). Data from a bio- 
logical replica of the experiment supported this conclusion 
(Parts etaL 2011). 

In order to analyze the allele frequency changes observed 
in the experiment, we considered a variety of scenarios 
described by deterministic evolutionary dynamics in con- 
junction with a stochastic sampling process resulting from 
finite sequencing depth. For the reasons argued above, the 
evolution of the system was taken to be driven by fit- 
ness differences between segregating alleles in the initial 
population. 



Driver and Passengers 

We consider first a system with a single driver at locus ;' with 
two possible alleles a G {O, l}, influencing all passenger 
mutations which are in linkage disequilibrium with it. We 
recall from deterministic single locus theory that the driver 
evolves according to the equation (Hartl and Clark 2007): 

dq^/dt = aiq]q°. (4) 

where the frequency of the allele 1 is denoted with q], 
{q° = ^ — q]), and the selection coefficient a, equals the 
Malthusian fitness difference/j' —jf between the alleles. 

The general mathematical framework to compute the 
effect of a deterministic driver on linked neutral variation 
has been introduced in classical work on genetic hitchhik- 
ing (Smith and Haigh 1974; Barton 2000). Here, we use the 
case with zero recombination because during the artificial 
selection phase, the population evolves asexually and no fur- 
ther recombination takes place. Therefore, given the time 
evolution of the driver, we can write down equations of 
motion for the passengers: 

oe{o,i} ^' ^ 

where tj^'' denotes a two locus haplotype at loci i,j with al- 
leles a, fa € {0, 1 }. Equation (5) follows simply by noticing 
that the passenger locus has by definition zero selection (its 
alleles would stay at their initial frequency without linkage 
to the driver), such that the passenger's initial linkage to the 
driver dictates its motion. We note that the two locus hap- 
lotype frequencies at the initial pool can be written in terms 
of allele frequencies and linkage disequilibrium: 

qf{to)=q^{t,)q^{to) + {-^r'D;J. (6) 

As such, the dynamics of the passengers given the driver are 
fully fixed by D,^. Values of Djj can be inferred directly for 
each locus or parametrized in terms of the recombination, 
which took place during the crossing: 

Di;(p,Aj) = d;^(i -pAJ)''"°™^ (7) 

where D,-; = min{q°(o)qj^(o),qj(o)c?°(o)} is the maxi- 
mum linkage disequilibrium attainable, Ajj denotes the dis- 
tance between the loci in base pairs (bp), p measures the 
recombination rate during the crossing process in units of 
1 / (bp X gen), and Ncrossing denotes the number of crossing 
rounds (for pAjj > 1, we set 0,^ = 0). Equation (7) as- 
sumes an infinite population size but is nevertheless a good 
description for the system due to the large number of indi- 
viduals in the initial pool. (See supplementary text. Supple- 
mentary Material online, for analysis of finite populations 
by means of computer simulation. The population size re- 
quired to correctly decide whether a marker moved due to 
linkage to a nearby selective sweep or just due to drift has 
been calculated; Logeswaran and Barton 2011.) 

In a case where full sequences from the initial pool were 
available, the initial linkage pattern could be included di- 
rectly by measuring the linkage, circumventing equation (7), 
which from the inference framework most critically relies on 
a large population size. 
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Fig. 1. Genome-wide view under unlinked analysis, (a) Ordered score differences AC (see Materials and Methods) between the unlinked additive 
drivers and the neutral model. Blue dots show values for all trajectories, whereas sets of red dots correspond to individual chromosomes (the 
uppermost is chromosome XV and the second highest 11; see supplementary figs. S2 and S3, Supplementary Material online, for individual chro- 
mosomes), (b) Global histogram of selection coefficient calling only loci for which AlC ^ 10. The negative mean selection a — —0.36 reflects 
that the NA allele is more often better adapted for the heat stress condition than the WA one. 



Liberal-Driver and Passengers 

We also consider models, in which drivers are allowed to 
take any trajectory, that is, in which their dynamics are 
not parametrized, and hence constrained, by the equation 
of motion given in equation (4). For these "liberal-drivers" 
the passenger dynamics again follow equation (5). We fur- 
ther evaluate time-dependent selection coefficients for such 
trajectories; from equation (4), we get for each time interval: 



(8) 



tk+^ — h- Such dynamics could result from 



where At^ 

externally driven truly time-dependent selective pressures, 
or as yet unidentified internal interactions. Internal inter- 
actions could result from linkage to other drivers, epistatic 
fitness interactions between the locus and other loci (e.g., 
genotype selection), frequency-dependent selection, or a 
combination of all these factors. We return to the interpre- 
tation of these drivers later. 

The evolutionary equations outlined in this section were 
applied within a standard maximum likelihood framework 
(see Materials and Methods) to explore the observed time 
evolution of the allele frequencies. 

Results 

Prevalence of Nonneutral Trajectories 
Close to 6% of the segregating sites across the genome 
were identified as evolving in a significantly nonneu- 
tral manner. To detect nonneutral behavior, we calcu- 
lated likelihoods for each trajectory under two mod- 
els; the first assuming that they evolved neutrally, with 
any motion reflecting noise from the finite depth of 
sampling, and the second assuming that they evolved 
independently under selection as described by equa- 
tion (4). For each locus, this gave maximum likelihood 
predictions for the allele frequencies, corresponding like- 
lihood scores, and for the second model, a trajectory- 
specific selection coefficient. Global statistics for the 
likelihood differences between models, and the identified 
selection coefficients, are reported in figure 1. Applying 
the Akaike information criterion (AlC) (Akaike 1974), the 
fraction of loci identified by this comparison as being 



not consistent with neutral evolution (AlC difference ^ 
10; Burnham and Anderson 2002) was 0.058 (see supple- 
mentary text and fig. SI, Supplementary Material online, 
for analysis of trajectories from a control experiment in- 
dicating false positive rate less than 1% and supplemen- 
tary figs. S2, S3, and S5, Supplementary Material online, for 
chromosome-specific breakdown). The average selection 
coefficient was —0.36 (standard deviation [SD] 0.63) imply- 
ing that the NA allele, rather than the WA one, is more of- 
ten the beneficial allele under the imposed condition of heat 
stress. The choice of which of the two alleles has frequency 

is arbitrary and sets the direction of selection (see eq. 4). 

This simple protocol of assigning a selection coefficient 
to each locus provided an instant genome-wide view of the 
data set, allowing for the rapid identification of genomic re- 
gions of interest. Using the log-likelihood scores generated 
under each model, we identified 44 candidate driver foci for 
further examination (see supplementary text. Supplemen- 
tary Material online). 

Including Linkage — Driver and Passengers 
A driver-passenger model (eqs. 4-7), which takes the link- 
age between nearby loci into account, gave a substantially 
better representation of the evolution of the system in all 
the 44 candidate driver foci than the unlinked driver model. 

A particularly intuitive example is provided by the driver 
in chromosome II reported in figure 2. Although the additive 
unlinked drivers model explains the main part of the sweep 
region well, it fails to account for the motion observed 
around 475 kb (fig. la). This failure is easy to understand — 
the observed allele frequencies move substantially during 
AtQ = ti — to but stay at almost the same value during At-\ 
and Atj — a motion incompatible with the family of curves 
parameterized by equation (4). While an unlinked driver 
allele, at frequencies close to 0.5, moves at roughly con- 
stant velocity towards fixation or death, an allele frequency 
changing through linkage may fix at intermediate values. 
Hence, taking model complexity into account, a model of 
a single driver and passengers (fig. lb), explains the changes 
in this region of the genome better. Most frequencies in the 
region move due to linkage, rather than through inherent 
selection. 
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Fic. 2. An example of unlinked additive drivers versus driver and pas- 
sengers models. Allele frequencies in the depicted chromosome II 
region moved substantially during the experiment, (a) Unlinked addi- 
tive drivers model w^here every trajectory follows equation (4): Thick 
dashed lines (to-red, t,-green, ti-blue, and ta-black) show the data and 
the corresponding solid lines the maximum likelihood predictions for 
the motion. The blue vertical line denotes the location of the largest a 
(for the full selection profile, see supplementary fig. S5, Supplementary 
Material online). The model explains the motion close to the sweep 
focus well; however, it fails qualitatively in the region ~475 kb. As ex- 
plained in the text, the motion in that region is not compatible with 
modes provided by equation (4). (fa) Driver and passengers model as 
parameterized by equations (4-7). The red vertical line denotes the in- 
ferred driver location and as can be seen the region ~475 kb is much 
better explained than by the unlinked additive drivers model. The dif- 
ference between the AlC scores was 466 in favor of the driver and 
passenger model, the large difference indicating the importance of 
including linkage between nearby loci in the model. Data shown are 
averaged over a sliding window of 5; however, all inferences are done 
with the raw data. 



Distributions of Selection Coefficients and 
Recombination Rates 

We inferred estinnates for driver selection coefficients for 
each of the candidate driver foci using the driver and pas- 
sengers model with linkage parametrized by recombination 
rate (fig. 3a). Furthermore, inference in each case gave a 
maximum likelihood estimate of the local recombination 
rate (fig. 3b). The inferred selection coefficient was nega- 
tive for 29 of the 44 regions (mean a = —0.2, SD 0.6), re- 
flecting the advantage conferred by the NA allele. A mean 
magnitude of selection of |cj| = 0.44 indicated that the 
drivers evolve under substantial selection. The maximum 
likelihood estimates for recombination rates (mean p = 
1.6 X 10"'^(bp X gen)"\SD 1.1 x 10"'') are consistent with 



estimates from the literature (Ruderfer et al. 2006; Mancera 
et al. 2008). 

Liberal-Driver and Passengers 

In the majority of cases (38/44 regions), the liberal-driver 
model gave a significantly improved fit to the observations 
after allowing for the additional two degrees of freedom 
(having three cr, values compared with one, see supplemen- 
tary table SI, Supplementary Material online). In the ex- 
ample of chromosome II, discussed above, a single driver 
with a constant recombination rate appeared to explain 
the observed changes in frequencies very well. Nevertheless, 
the liberal-driver model explained the motion much better 
(gain of 166 units of log-likelihood, see supplementary fig. 
S4, Supplementary Material online). However, in chromo- 
some Xlll, the second example in which the identified driver 
(almost) fixed within the experimental time frame, a gain 
of only 3 units of score was achieved, favoring the standard 
driver and passengers model once degrees of freedom were 
taken into account. 

Apart from these events, other driver alleles were found 
at intermediate frequencies by the end of the experiment. 
This observation has two possible explanations. First, all can- 
didate drivers could evolve according to trajectories de- 
fined by equation (4) but with values of cr, too low to 
observe fixation during the length of the experiment. Sec- 
ond, candidate drivers could evolve according to some 
alternative equations of motion. Application of the liberal- 
driver model suggested the latter explanation to be correct 
for the majority of candidate drivers. While the liberal-driver 
model introduces two additional degrees of freedom, use 
of this model produced a mean score gain per region of 26 
(see supplementary table SI, Supplementary Material on- 
line), more than compensating for the gain in parameters. 
Figure 4 shows an example within chromosome XV where 
the liberal-driver model gave a substantial improvement. 

Averaged across the duration of the experiment, the 
selection coefficients inferred for the liberal-drivers, ex- 
pressed as in equation (8), were very similar to those 
shown in figure 3 from the driver and passengers model 
(mean —0.19 SD 0.68). Estimates of local recombination 
rates obtained using liberal-drivers were also very similar 
(mean 1.5 x 10"*^ SD 1.9 x 10"*^). To gain an insight 
to the underlying reason for the liberal drivers' superior 
log-likelihood scores, we studied the process at the level of 
individual time intervals At. 

Time-dependent selection coefficients were used to 
obtain point estimates of fitness flux, cf>, a measure of the 
amount of ongoing adaptation in the system at a given time 
point expressed as (/>(tfc) = Y^iedmers ^ii^i<)'^'^iM/^ti, 
(Mustonen and Lassig 2007; Mustonen and Lassig 2010). 
The estimates of fitness flux over the three measured time 
intervals differ noticeably between the two models (fig. 5a). 
Whereas, under the standard-driver model, most adapta- 
tion takes places in the first time interval, for liberal-drivers, 
the second time interval dominates. Most interestingly, 
under the liberal-driver model, the fitness flux almost 
vanishes for the last time interval, suggesting that by the 
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Fic. 3. Statistics of selection and recombination under the driver and passengers model, (o) Histogram of selection coefficients for the driver foci 
(mean ct = —0.2 SD 0.6). (b) Histogram of inferred recombinations rates for these regions (mean p = 1.6 x 10^^ SD 1.1 x 10^''). 



end of the experiment, the system had almost equilibrated. 
This point is further demonstrated by the distribution of 
the inferred motion of the drivers under both standard 
and liberal scenarios during the last time interval, shown in 
figure Sb. 

We suggest that the apparent equilibration observed 
here should be understood in the sense of separation of 
timescales, in that it represents the completion, w/ithin 
sequencing resolution, of the first phase of adaptation 
(due to the finite sequencing depth, we would not 
observe movement slower than < (read depth)^YZ\t). 
Given further propagation of the system, deviation from 
this equilibrium would be observed through the arrival of 
new mutations; however, as discussed later, we do not find 
evidence for these at substantial frequencies within the 
time frame of the experiment. 

The observation of candidate driver alleles reaching equi- 
librium at intermediate frequencies suggests the presence 
of interactions between drivers in different locations of the 
genome. An alternative scenario exists, in which drivers 
do not interact with one another but instead evolve with 
genuinely time-dependent selection coefficients. However, 
while behavior of this kind might be observed in response 
to changes in the external environment, the consistency 
of the experimental conditions suggests its occurrence 
here to be unlikely, interactions between drivers being the 
most likely source of the observed allele frequency changes. 
Different effects leading to interactions between drivers are 
discussed later. 

New Beneficial Mutations 

We contended earlier, based on a simple calculation, that de 
novo mutations are unlikely to play a substantial role for the 
allele frequency dynamics during the experiment because 
they would need to carry fitness advantage > 3%/ h to be de- 
tectable during the experiment. However, in the measured 
spectrum of fitness effects in figure 3, we observe selection 
coefficients as large as to be w2.4%/ h, raising the possibil- 
ity that novel mutations might have beneficial effects strong 
enough to have a measurable effect. We thus took further 
steps to rule out this possibility. 

As there is no recombination during the selection pro- 
tocol, any de novo mutation would lead to a global 



perturbation of the allele frequency dynamics, one of the 
two alleles at each segregating site being fully linked to 
the new mutation. This would lead to a large effect. 
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Flc. 4. An example of standard-driver versus liberal-driver models. Al- 
lele frequencies in the depicted chromosome XV region moved sub- 
stantially during the experiment. Thick dashed lines (to-red, ti -green, 
t2-blue, and ta-black) show the data and the corresponding solid lines 
the maximum likelihood predictions for the motion; vertical lines de- 
note the inferred driver location, (o) Driver and passengers model: The 
model explains the motion overall quite well; however, it fails qualita- 
tively near the sweep focus at 172 kb. The motion, which, as is evident 
by the visible overlap between the blue and black dashed lines, seems 
to reach equilibrium at an intermediate frequency, is not compatible 
with models provided by equation (4). (fa) Liberal-driver and passen- 
gers model: The trajectories near the focus are much better explained 
than by the standard-driver model. The liberal-driver interpretation 
gives a gain of 79 units in log-likelihood at a cost of two extra degrees 
of freedom and is thus strongly supported statistically. Data shown are 
averaged over a sliding window of 5; however, all inferences are done 
with the raw data. 
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Fic. 5. Adaptation under drivers versus liberal-drivers interpretation, (a) Estimates of fitness flux: In the standard-driver interpretation (eqs. 4-7), 
most adaptation took place during the first time interval, Ato, whereas liberal-drivers adapt most during Z\t, and have a higher total fitness flux. 
It is also apparent that while standard-drivers continue to sustain a fitness flux during the last interval At2, the flux generated by liberal-drivers 
has all but vanished, (fa) Inferred motion of the drivers (red) and liberal-drivers (green) during the last time interval supports the observation that 
under the liberal-driver interpretation the system has almost reached equilibrium (in the sense discussed in the text). 



increasing as a function of time, on the movement of 
the allele frequencies. We thus calculated genome-wide 
statistics of absolute changes in allele frequency at the 
three time intervals (i.e., \q{tk) — q{tk-^)\) from the 
trajectories inferred under the unlinked driver model. 
The statistics showed that the absolute movement of al- 
lele frequencies slightly decreased throughout the course 
of the experiment, with mean values of 0.032, 0.032, 
0.030, respectively, reflecting our inference that 94% of 
allele frequencies evolved in a manner compatible with 
neutral (no motion) evolution with the remaining 6% 
evolving as discussed. This decrease suggests that de 
novo mutations did not significantly affect the allele 
frequency dynamics during the time interval reported. 

Spatial Uncertainty of Identified Drivers 
Likelihood calculations suggested that, using the (liberal) 
driver passenger models, the precise location of a driver 
could be identified on average to within an accuracy of 12 
kb. Due to li n kage, alleles at nearby loci are li kely to move in a 
similar manner, leading to uncertainty in the precise choice 
of location of a driver locus. This uncertainty was quantified 
by taking the maximum likelihood drivers inferred under 
the standard- and liberal-driver models and checking 



the next 20 segregating sites both up- and downstream, 
assigning each in turn to be the driver, reoptimizing the 
remaining degrees of freedom, and calculating the resulting 
likelihoods. Figure 6a shows for standard-drivers that, 
using a log-likelihood cutoff of three units (95% confidence 
interval), the driver allele can on average be located within a 
window of 12 kb centered on the maximum likelihood loca- 
tion. This means that on average 38 segregating sites remain 
to be further studied for each driver focus (see supplemen- 
tary text. Supplementary Material online, for details how 
this number of segregating sites was calculated). However, 
region-specific uncertainty strongly depends on the selec- 
tive strength of the driver, local recombination rate, and 
on the allele read numbers from sequencing — variability of 
these factors leads to substantial variance in the accuracy 
of the inferred driver location (fig. 6a). Nevertheless, for 
regions with strong drivers, the small number of segre- 
gating sites to be further studied reflects the efficiency of 
recombination breaking linkage when advanced intercross 
lines are used in the experimental design (Darvasi and SoUer 
1995). Comparison of driver locations under each of the two 
models, shown in figure 6b, revealed differences that were 
mostly within the uncertainty range, with mean absolute 
distance ~2 kb. 




10 15 20 25 30 
Akb 



10 -5 0 5 10 15 
Akb 



Fic. 6. Driver location uncertainty, (a) Inferred sizes of the regions containing loci that when fixed to be the driver are within a distance of 3 units 
of score (95% confidence interval) from the maximum likelihood value (with reoptimization of the remaining degrees of freedom). Data shown 
are for standard-drivers. Variability in the magnitude of selection and local recombination rates leads to substantial variation in the uncertainty of 
the driver locations, (fa) Distances between the maximum likelihood driver locations inferred under the standard-driver and liberal-driver models. 
For most foci, the two models gave predictions agreeing within variability identified in panel {a). 
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Analysis of a Biological Replicate Experiment and 
Estimates of False Positive Rates 

One great advantage of artificial selection protocols is 
tfie opportunity to analyze biological replicates to gauge 
robustness of inferences made. We analyzed data from 
a biological replicate experiment and show that inferred 
statistics of selection replicate well (see supplementary text, 
Supplementary Material online, for details). However, as the 
replicate data set had fewer time-points, and was thus not 
fully comparable to the one discussed so far, we performed 
computer simulations to estimate false positive rates for 
the analyzed 44 driver set. This resulted in an estimated 
false positive rate for our driver region detection to be less 
than ~2% for populations of size 10^. In these simulations, 
we chose the number of segregating sites, number of drivers 
and their estimated selection strengths, and recombination 
rates to mimic the ones found from the experimental data 
(see supplementary text. Supplementary Material online, 
for details). 

The 44 regions called in our analysis as containing driver 
loci are significantly larger than the more conservative 21 (all 
these regions are in our list) obtained by Parts et al. (201 1 ), 
who, for example, only call QTLs observed in both the pri- 
mary and the replica data sets. The existence of a higher 
number of regions under selection than previously re- 
ported is supported by the false positive rate identified from 
simulated data. 



Discussion 

Quantifying Selection from Time-Resolved Allele 
Frequencies 

We performed the first, single nucleotide resolution, 
genome-wide population genetic analysis of time series 
allele frequency data from an evolving outbred popula- 
tion under selection. The analysis revealed several insights 
into the dynamics of the population at the molecular level. 
First, we estimated that close to 6% of the over 30,000 
segregating sites are affected by the selection pressure as 
identified via deviation from a neutral null model. Sec- 
ond, we demonstrated the main force causing this motion 
to be genetic linkage, which causes passenger alleles to 
hitchhike with driver alleles. Third, using a driver and 
passenger model, we quantified the selective advantage 
of the found drivers, the amount of linkage disequilib- 
rium within the regions that they reside, and the uncer- 
tainty in their locations. The method we describe offers 
a first-order approach to examining selection in such an 
experiment. 

Extending the above, by allowing driver alleles to evolve 
arbitrarily, we found substantial statistical evidence of 
fitness effects going beyond those compatible with stan- 
dard additive models of selection, such that many of the 
driver alleles evolve under an effectively time-dependent fit- 
ness seascape (Mustonen and Lassig 2009). We next discuss 
possible scenarios underlying the observed liberal-driver 
dynamics. 



Linkage Pattern in the Initial Pool 
One explanation for the inferred liberal-drivers (or in- 
deed the standard drivers) signal would be the existence 
of linkage disequilibrium between candidate drivers. Such 
linkage could result from stochastic effects during the 
crossing, arising due to the finite population size, a sce- 
nario which equation (7) would not capture. Numeri- 
cal simulations (reported in supplementary text. Supple- 
mentary Material online) showed that population sizes 
from ~10^ upwards are sufficient to rule out distortion 
to the global selection statistics arising from such noise, 
on the assumption that the numbers of segregating sites 
and drivers, driver selection strengths, and recombina- 
tion rates are close to those inferred from the real data. 
Furthermore, analysis of data from a biological replicate 
gave overall statistics of selection consistent with those 
inferred, supporting the conclusion that drift-generated 
linkage disequilibrium is not causing the signal (see sup- 
plementary text. Supplementary Material online, for more 
details). 

From our simulations, we do note, however, that a large 
number of false positives would be generated due to drift- 
generated linkage in population of size 10^. Therefore, in 
smaller populations, our method should be applied only 
for identifying the largest fitness effects to avoid being 
swamped by false positives (which would reflect mistaken 
measurements of selection rather than any underlying fit- 
ness land- or seascape) or in combination with full sequence 
data from the initial pool to fix the linkage. The population 
size required to correctly decide whether a marker moved 
due to linkage to a nearby selective sweep or just due to drift 
has been calculated (Logeswaran and Barton 2011). 

Another possible reason for a nontrivial linkage pattern 
in the initial pool could come from unknown selection pres- 
sures during the crossing protocol. Given that such selection 
could in principle be arbitrarily complex, our analyses of the 
data here remain to some extend vulnerable to this. How- 
ever, there is no statistical evidence for interchromosomal 
pairwise linkages between genotype data of 19 QTL loci (all 
part of our candidate driver foci list) from 960 individuals 
at time point {tj + t^)/! as reported in Parts et al. (2011). 
Therefore, worst-case scenarios where all but one of the 
candidate drivers were linked to, and thus would merely 
hitchhike with, one of the two nearly fixing drivers are not 
supported by the data. Similarly to nonlocal linkage dise- 
quilibrium generated by small populations, possible effects 
of selection during the crossing to allele dynamics under 
the artificial selection phase could be easily included to the 
present method given suitable data. Extending the method 
to cover simultaneous recombination and selection is pos- 
sible but remains a topic for further investigation. 

Clonal Interference 

In asexually evolving populations, recombination cannot 
combine beneficial mutations that are on different haplo- 
types. This leads to so-called clonal interference where some 
beneficial mutations will be removed from the population 
due to them being interfered with stronger ones in different 
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backgrounds (Fisher 1930; Muller 1932). The dynamics of 
clonal interference are complex and both its experimental 
and its theoretical study has long been an active research 
topic (for recent reviews see Park et al. 201 0; Sniegowski and 
Gerrish 2010). 

For the results presented here, the large number of 
drivers make it unlikely to have a large number (or indeed 
any) individuals with all the beneficial alleles in a population 
of size ~ 10^-10*, so that such interference seems inevitable. 
However, simulation and sequencing data both suggest that 
clonal interference does not account for the liberal-driver 
signal observed here. 

Analyzing simulated data with the number of drivers (ad- 
ditive in fitness) and their strengths comparable to those 
inferred from the real experiment, we concluded that the 
observed liberal-driver dynamics does not reflect effects 
primarily caused by clonal interference. We reached this 
conclusion by analyzing the simulated data under both 
standard and liberal-driver models (see supplementary text, 
Supplementary Material online). With both models, we 
were able to discover the "true" driver set and reproduce the 
correct selection strengths for the drivers. However, using 
a liberal-driver model to infer selection from the simulated 
data, we did not see the decrease in fitness flux observed 
for the real system (contrast fig. 5 and supplementary fig. 
SI 4, Supplementary Material online). For this reason, we be- 
lieve that clonal interference is not the primary explanation 
for the stronger performance of the liberal-driver model for 
most candidate driver regions in the real data. Supporting 
this conclusion, genotyping data of 24 loci from 960 indi- 
viduals from time point {ti + h)/! showed that no sin- 
gle genotype (clone) dominated the population at that time 
(maximum count observed was 6), a considerable diversity 
of 787 unique haplotypes being detected and the data be- 
ing consistent with the expectation under a hypothesis of 
independent sites (Parts et al. 201 1 ). 

Given the absence of recombination, it is inevitable that 
clonal interference would, after sufficient time had passed, 
have a significant influence on the evolution of driver al- 
lele frequencies. However, given additive drivers of the 
size and number that we find from the real system, that 
point would not have been reached by the end of the 
experiment. 

Clonal interference between (possible) multiple benefi- 
cial alleles within a driver focus remains a possibility but 
would not stop the dominant combination of these mu- 
tations from fixing and as such is not consistent with 
the liberal-driver signal. Extension of the method to cover 
multiple drivers will be required to study such effects and is 
beyond the scope of this manuscript. 

Epistasis and Other More Complex Selective Scenarios 
Given that linkage or clonal interference between driver 
foci do not satisfactorily explain the observed dynamics, we 
believe that the liberal-driver signal is likely to be caused 
by fitness effects going beyond additive models of selec- 
tion. For instance, one such complex selection scenario 
could be that of epistatic interactions between drivers 



(Weinreich etal. 2005), encompassing combinations of pairs 
or potentially multiple drivers, while the possibility of gen- 
uinely time-dependent or frequency-dependent selection 
cannot be definitively ruled out. Models could be developed 
to examine the likelihood of each of these potential scenar- 
ios. However, inference of a correct model based on the data 
available presents a substantial challenge, such that the pre- 
cise nature of the observed time dependency remains to be 
shown. 

Comparable Events at Macroevolutionary Timescales 
The adaptive dynamics studied here consisted of driver al- 
leles at loci under strong selection for heat resistance and 
neutral passengers alleles linked to the drivers. As can be 
seen in the frequency profiles in figures 2 and 4 and sup- 
plementary fig. S5, Supplementary Material online, genomic 
regions around the drivers have substantially reduced al- 
lelic variance. The experiment and the observed dynamics 
took place over timescale of days. Nevertheless, a similar 
pattern of the hitchhiking effect has been described theo- 
retically (Kim and Stephan 2002) and found, among others, 
in the case of Drosophila, where individual selective sweeps 
in the recent past leave remnant genomic valleys of reduced 
variability (Svetec et al. 2009; Macpherson et al. 2007). How- 
ever, while in the case of selective sweeps, variability is re- 
moved completely from the genomic locality, in many of the 
cases studied here, a substantial degree of variance remains 
in the immediate vicinity of the sweep locus after the adap- 
tive event: The genomic signature resembles that of a soft 
sweep (Hermisson and Pennings 2005), where the beneficial 
driver allele is initially linked to many different background 
genotypes. 

From Driver Locations to Biology 
In the data considered here, the average spatial resolution 
which can be attained in identifying driver locations is about 
1 2 kb (see fig. 6). As such, identification of individual variants 
responsible for the inferred fitness advantages would re- 
quire further experimental or bioinformatic analysis, such 
downstream analysis being very important in terms of the 
biological insight that can be collected from this data. Anal- 
ysis of some of the large events observed in the data has been 
carried out in earlier work (Parts et al. 201 1 ), showing them 
to be biologically interesting with respect to heat tolerance. 
While mapping the precise locations of the drivers are im- 
portant for the biology of the trait in question, we note that 
inference of selection as was done here is not sensitive in 
this respect and thus, the resulting estimates for selection 
are robust. 

Conclusions 

Our analysis here has been of time-resolved allele frequency 
data from a yeast population under heat stress. However, 
the method presented has potential application to a range 
of organisms under a variety of selection pressures. Sev- 
eral conditions were necessary for our analysis to be sensi- 
ble: efficient recombination during the crossing to break up 
linkage and create variation, a large population size and high 
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initial allele frequencies to reduce the effect of genetic drift 
and clonal interference, the duration of the experiment, 
which does not allow de novo mutations to start interfering, 
and finally, the application of strong selection. The method 
can be extended to sexually propagated populations by con- 
sidering selection and recombination simultaneously, and in 
such a setting, the key conditions enumerated above will be- 
come somewhat more lenient. 

The focus on quantifying selection and utilizing multi- 
ple consecutive time points distinguishes our method from 
a standard QTL mapping design, although such a mapping 
would be a natural biological application. We see the com- 
bination of the experiment with multiple time points and 
the approach presented here as a model for the quantita- 
tive study of the fitness effects of mutations and their pos- 
sible interactions. Time resolution is key to our analyses; if 
data had been collected only from the initial and final pool, 
we would not have been able to show that majority of the 
drivers are not compatible with a standard model of addi- 
tive selection. Under the framework of experiment and data 
analysis described here, alterations to study other variables 
would be possible; for example, the effect of drift could be 
examined through modulating the population size. 

With the imminent arrival of deep population samples of 
time-resolved full genome sequences from evolution exper- 
iments, the possibility emerges of scaling up our measure- 
ments of the fitness effects of mutations to include complex 
multilocus interactions. Application of population genetic 
theory will be central to making the best possible use of such 
data. 

Supplementary Material 

Supplementary text, figs. S1-S14, and table SI are available 
lit Molecular Biology and Evolution online (http://www.mbe. 
oxfordjournals.org/). 
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